ABSTRACT

There are groups of genes that need coordinated 
repression in multiple contexts, for example if they 
code for proteins that work together in a pathway 
or in a protein complex. Redundancy of biological 
regulatory networks implies that such coordinated 
repression might occur at both the pre- and post- 
transcriptional level, though not necessarily simul- 
taneously or under the same conditions. Here, we 
propose that such redundancy in the global regula- 
tory network can be detected by the overlap between 
the putative targets of a transcriptional repressor, as 
identified by a ChIP-seq experiment, and predicted
targets of a microRNA (miRNA). To test this hypothesis,
we used publicly available ChIP-seq data for the
neural transcriptional repressor RE1 silencing transcription
factor (REST) from 15 different cell samples.
We found 20 miRNAs, each of which shares a
significant number of predicted targets with REST.
The set of predicted associations between these 20 
miRNAs and the overlapping REST targets is en- 
riched in known miRNA targets. Many of the detected 
miRNAs have functions related to neural identity and 
glioblastoma, which could be expected from their 
overlap in targets with REST. We propose that the in- 
tegration of experimentally determined transcription 
factor binding sites with miRNA-target predictions 
provides functional information on miRNAs. 

INTRODUCTION 

Cell control requires the interplay of a molecular network 
acting at multiple biological levels, of which transcriptional 
regulation of gene expression is possibly the most impor- 
tant one. Accordingly, most efforts to understand cell con- 
trol have been focused on transcription factors (TFs). 

The discovery of microRNAs (miRNAs), however, 
brought the post-transcriptional regulation of gene expres- 
sion to the limelight. miRNAs are an RNA species of about 



23 nt that bind to cis-regulatory elements in target mRNAs
and help to tune their expression pattern by repressing
translation or destabilizing the mRNA ((1), see (2) and
references therein). miRNAs seem to be very abundant (3) 
and the number of experimentally known ones is increasing 
steadily (there are currently 2577 mature human miRNAs 
in miRBase release 20 (4)). 

However, the small molecular size and mode of action 
of miRNAs make the study of their function more difficult 
than that of TFs. miRNA regulatory activity seems to be 
more subtle than that of TFs: while knockdowns of TFs of- 
ten cause severe and detectable phenotypes (5), this is rarely 
the case for miRNAs (6). There may be multiple reasons 
for this: miRNAs only have an impact on the regulation of 
genes that have already been transcribed, therefore acting 
further downstream than TFs; co-regulation of the same 
transcript by multiple miRNAs is common and therefore 
the effect of loss of one miRNA might not have a big impact 
(6,7); the same miRNA usually regulates genes involved in a 
wide range of processes (1). As a result, over-expression or 
knockout of miRNAs is not as helpful for the verification of 
their function as for protein coding genes. For this reason, 
miRNA functional characterization is highly dependent on 
the identification and analysis of their target genes (8) and 
this has driven the development of computational methods 
for the prediction of miRNAs and their transcript targets. 
However, the accuracy of such methods is still low and they 
result in too many predictions (false positive rates from 24 
to 70% (9)). Strategies to filter large lists of candidate tar- 
gets should help to direct experimental efforts to the most 
likely and relevant miRNA-target interactions. 

Here, we propose to use the increasingly large body of ex- 
perimental knowledge on transcriptional regulation to im- 
prove the precision of miRNA target predictions. Our hy- 
pothesis is that there are groups of genes whose expression 
needs to be coordinately repressed in multiple contexts be- 
cause they are functionally related (for example, if they code 
for the members of a pathway). Such repression might occur
at different regulatory levels, for example via a transcriptional
repressor or by a miRNA. We note that these multiple
mechanisms of repression may generally occur in unrelated
tissues, cell types or developmental stages: a given group of
genes could be repressed by a transcriptional repressor in
one cell type and by a miRNA in another.

Figure 1. Description of our approach. We collected predictions of
miRNA binding sites (green and purple boxes, for example for miR-A
and miR-B, respectively) in 3'UTRs of all human genes (represented in
the black box on the left). The subset of those belonging to genes identified
as bound by a given transcriptional repressor (e.g. REST) in a ChIP-seq
experiment (blue box on the left) is selected for analysis. Enrichment
analysis (large arrow; see the Materials and Methods section) identifies that
miR-A binding sites are enriched in the subset whereas miR-B binding sites
are not. As a result we filter the set of predicted miR-A binding sites on the
REST targets (box on the right).

Therefore, given a list of genes targeted by a transcrip- 
tional repressor in one cell sample, this list can be examined 
to find out whether a specific miRNA is predicted to tar- 
get a significant fraction of these genes. This finding could 
be taken as an indication that the corresponding miRNA- 
target predictions are biologically relevant, although it will 
not inform us of where their regulatory effect will take place. 
A graphical explanation of our approach is depicted in Fig- 
ure 1. 

Technical developments in high-throughput measurement
of TF binding support our approach. The ChIP-seq
technique, a combination of chromatin immunoprecipitation
(ChIP) and massively parallel DNA sequencing, allows
the identification of interactions between proteins and
DNA (10). Owing to rapid progress in this field, an increasing
number of ChIP-seq datasets are now available on public
platforms. Lists of potential target genes of certain TFs
can be generated by identifying and annotating the genes 
close to the binding sites found. 

Here, we present the application of our approach using 
15 datasets measuring the binding sites of the repressor 
RE1 silencing transcription factor (REST; also known as 
neuron-restrictive silencer factor, NRSF) in 15 different cell
types (Table 1). REST has a wide spectrum of activities re- 
lated to the fine-tuning of neuronal gene expression both in 
neural tissue (e.g. neuronal progenitor cells (11) and adult 
brain (12)) and in non-neural cell types like Jurkat T cells 
where REST down-regulates neural genes (13). But it has 
also been found to regulate a wide range of non-neural tar- 
gets (10,14). 



MATERIALS AND METHODS 

Datasets 

The high-throughput datasets reporting genomic bind- 
ing sites for the transcriptional repressor REST and for 
other factors used in this study were generated and pub- 
lished by the ENCODE project and can be accessed from 
the link http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeHaibTfbs/ (15).

Assigning binding site locations to genes and miRNAs from 
ChIP-seq data

Peak locations from ChIP-seq data were assigned to RefSeq
genes of the reference genome hg19 according to prioritized
criteria: if the peaks were situated (i) in their known or pre- 
dicted promoter region (according to a database on human 
and murine promoters: MPromDb (16)); (ii) up to 1000 bp 
upstream of their transcription start site (TSS); (iii-a) for 
a single exon gene, up to 1000 bp downstream of the TSS; 
(iii-b) for a multi-exon gene, anywhere between the TSS to 
coding start plus first intron of a gene; (iv) up to 5000 bp 
upstream of their TSS; and (v) up to 5000 bp downstream 
of their transcription end site. This method stems from the 
observation that the highest frequency of binding sites for 
certain TFs can be found mainly in the first intron or in 
the core promoter region (17). If multiple replicates of the 
ChIP-seq experiments were available, a gene or miRNA was
considered to be potentially regulated by the factor if it was 
identified in at least two replicates. REST was considered to 
bind to a miRNA-gene if a peak was found up to 10 000 bp 
up- or downstream of the miRNA sequence (miRNA posi- 
tions from miRBase, release 16). 
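
The replicate rule and the 10 000 bp window around miRNA loci can be illustrated with a minimal Python sketch. The function names and the coordinate representation (start and end positions on the same chromosome) are assumptions made for illustration and are not taken from the original pipeline; in particular, the sketch treats any overlap between a peak and the extended locus as "found up to 10 000 bp up- or downstream".

```python
from collections import Counter

WINDOW_BP = 10_000  # window around the miRNA locus, as stated in the text


def peak_near_locus(peak_start, peak_end, locus_start, locus_end, window=WINDOW_BP):
    """True if the peak overlaps the locus extended by `window` bp on both sides.

    All coordinates are assumed to lie on the same chromosome.
    """
    return peak_start <= locus_end + window and peak_end >= locus_start - window


def bound_in_replicates(hits_per_replicate, min_replicates=2):
    """Return the names (genes or miRNAs) identified in at least
    `min_replicates` of the replicate hit sets."""
    counts = Counter(name for hits in hits_per_replicate for name in set(hits))
    return {name for name, count in counts.items() if count >= min_replicates}
```

When only a single replicate is available, the same helper could be called with `min_replicates=1`, which simply returns every identified gene or miRNA.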

miRNA binding site predictions 

Predictions of miRNA binding sites for all annotated hu- 
man 3'-UTRs, according to the UCSC genome database, 
were obtained from TargetScanHuman 6.2 (http://www. 
TargetScan.org/ (18)). The dataset comprised miRNA bind- 
ing site predictions in the 3'UTRs of human genes after 
pooling the predictions for all variants associated with each 
gene. The predictions corresponded to 4 492 024 unique 
miRNA-gene pairs. To ensure higher accuracy (19), only 
conserved miRNA binding sites across vertebrates and 
broadly conserved miRNA families were used for the analysis,
resulting in 72 770 unique miRNA-gene pairs for 11 161
genes.

Calculating over-representation of miRNA binding sites in 
3'UTRs of factor-bound genes 

miRNA binding site over-representation for targets of a 
miRNA in sets of factor-bound genes (e.g. REST-bound 
genes) was computed in the following way (see Figure 2). 
Genes with miRNA target predictions in TargetScanHuman
were taken. Given n factor-bound genes,
and given that m_A of them are predicted to be targets of
a given miRNA family miR-A, we took n genes at random
from the background of all TargetScanHuman genes
with predicted miRNA targets 10 000 times and counted
Table 1. miRNA families with predicted binding sites significantly enriched in the 3'UTRs of REST target genes (FDRs < 0.1)



miRNA family 


Times 
found 


ECC1 GM12878 


H1-hESC 


H1 -neurons 


HCT-116 


HeLa-S3 HepG2 


HL-60 


K562 


MCF-7 


PANC-1 


PFSK-1 


SK-N-SH 


U87 


miR-101/101ab 


1 






- 


0.058 


- 


- 


- 


- 


- 


- 


- 


- 


- 


miR-129-5p/129ab- 
5p 


8 


- 


0.008 


0.054 


0.015 




0.031 0.031 




0.010 




0.092 






0.015 


miR-132/212/212-3p 


1 








0.078 




















miR-138/138ab 


6 






0.070 




0.096 


0.097 




0.010 




0.097 






0.015 


miR-139-5p 


1 








0.022 




















miR-153* 


12 


0.077 


0.008 


0.005 


0.010 


0.008 


0.094 0.036 




0.015 


0.008 


0.005 




0.015 


0.015 


miR-185 family 


8 


0.061 


0.058 


0.096 




0.005 


0.008 






0.004 


0.005 






0.048 


miR-190/190ab 


1 






0.079 






















miR-208 family**


1 








0.023 




















miR-217 


1 








0.010 




















miR-218/218a 


10 


0.054 


0.005 


0.005 




0.005 


0.073 


0.069 




0.004 


0.086 


0.065 




0.008 


miR-300/381/539-3p 


2 








0.019 
















0.082 




miR-326/330/330-5p 


1 


- 


- 






















0.048 


miR-329 family 


4 






0.079 




0.052 








0.021 


0.092 








miR-34 family 


1 


























0.086 


miR-374ab 


1 












0.061 
















miR-421 


2 




0.071 






0.064 


















miR-448/448-3p* 


13 


0.015 


0.008 


0.005 


0.022 


0.005 


0.008 0.015 


0.015 


0.019 


0.008 


0.005 




0.015 


0.015 


miR-499-5p** 


1 








0.018 




















miR-543 


2 








0.064 














0.082 







Neural cell lines are highlighted in gray. No significant results were found for cell line A549 (not included). 
*These miRNA families have overlapping seeds (non independent results, see text for details).
**These miRNA families have overlapping seeds (non independent results, see text for details). 



the targets of this miRNA family (z_A). To account for the
fact that factor-bound genes might have a higher tendency
to bear predicted miRNA targets (e.g. because they would
have longer 3'UTRs), we computed a factor (r) to correct
the values of z_A. Given the set of n factor-bound genes, we
counted the total number of miRNA-to-gene associations,
m_t, in the set for all 153 miRNA families considered in
TargetScanHuman. For every random set the same was done
to obtain z_t. We multiplied the number of targets of the
respective miRNA family, z_A, by the ratio r between m_t and
z_t (r = m_t / z_t), resulting in a corrected value
z_A* = z_A · r. Then, we counted how
many times z_A* was smaller than m_A. The sum of successful
tests divided by the number of randomizations (10 000)
was taken as a p-value of miR-A enrichment in the target
genes. The p-value was corrected for multiple testing using
the Benjamini and Hochberg method to yield a false discovery
rate (FDR).



Factor r is larger than 1 if the factor-bound genes have a 
tendency to have more predicted miRNA targets than the 
background. In general, we observed values of r between 1 
and 1.47 for all REST ChIP-seq datasets.
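
The randomization procedure with the correction factor r can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code used in the study: the function names, the dictionary layout (miRNA family mapped to its set of predicted target genes) and the fixed random seed are illustrative. The sketch counts the random draws in which the corrected count z_A* reaches or exceeds the observed count m_A, which is the usual orientation for a one-sided empirical p-value; that orientation is an assumption about how the counting described above is meant.

```python
import random


def enrichment_p_value(bound_genes, targets_by_family, family, background,
                       n_random=10_000, seed=0):
    """Empirical p-value for over-representation of `family` targets
    among the factor-bound genes, with the correction factor r.

    targets_by_family: dict mapping family name -> set of predicted targets.
    background: all genes with predicted miRNA targets.
    """
    rng = random.Random(seed)
    background = sorted(background)
    bound = set(bound_genes)

    def pair_count(genes):
        # Total miRNA-gene associations over all families (m_t or z_t).
        return sum(len(targets & genes) for targets in targets_by_family.values())

    m_a = len(targets_by_family[family] & bound)  # observed targets of miR-A
    m_t = pair_count(bound)                       # observed associations
    hits = 0
    for _ in range(n_random):
        sample = set(rng.sample(background, len(bound)))
        z_a = len(targets_by_family[family] & sample)
        z_t = pair_count(sample)
        r = m_t / z_t if z_t else 1.0             # r = m_t / z_t
        if z_a * r >= m_a:                        # corrected z_A* vs. m_A
            hits += 1
    return hits / n_random


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                  # from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In such a setup, one p-value would be computed per miRNA family and the resulting list passed to `benjamini_hochberg` before applying the FDR cutoff.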



Calculating over-representation of REST-bound genes in 
miRNA family targets 

For each miRNA family with n target genes (m of them being
REST targets) we picked at random n genes from the
TargetScanHuman list 10 000 times and counted how many
times we observed >m REST targets. The counts, as a fraction
of the 10 000 draws, were taken
as p-values of over-representation of REST-bound genes in
the miRNA family targets, which were then Bonferroni corrected
for multiple testing. We considered as REST targets
the pooled set of genes bound by REST in our ChIP-seq datasets.
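
The reverse test can be sketched in the same way. The code below is a minimal sketch with hypothetical function names; it counts draws with strictly more than m REST targets, following the comparison as printed in the text, and applies a plain Bonferroni correction (multiplying each p-value by the number of tests, capped at 1).

```python
import random


def rest_overrepresentation_p(n_targets, m_rest, background, rest_bound,
                              n_random=10_000, seed=0):
    """Fraction of random draws of `n_targets` genes that contain
    more than `m_rest` REST-bound genes."""
    rng = random.Random(seed)
    background = sorted(background)
    rest_bound = set(rest_bound)
    exceed = sum(
        len(rest_bound.intersection(rng.sample(background, n_targets))) > m_rest
        for _ in range(n_random)
    )
    return exceed / n_random


def bonferroni(p_values):
    """Bonferroni-adjusted p-values, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```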
Figure 2. Illustration of the calculation of over-representation of a
miRNA family: p-value calculation by 10 000 random tests for REST target
genes. [Flowchart: for the n REST-bound genes, m_t = number of miRNA-target
gene pairs and m_A = number of targets of miR-A; for each of 10 000 sets of
n random genes drawn from the 72 770 TargetScanHuman miRNA-target gene
pairs, z_t = number of miRNA-target gene pairs and z_A = number of targets
of miR-A; r = m_t / z_t; z_A* = z_A · r; p-value = times TRUE / 10 000.]



miRNA expression in tissues 

We used the atlas on mammalian miRNA expression (20) to 
define miRNA tissue expression. This atlas contains clone 
counts obtained by small RNA library sequencing. We fo- 
cused on miRNAs that had a total of at least 10 copies de- 
tected over all tissues in the atlas and were present in Tar- 
getScanHuman. A miRNA was considered to be expressed 
in a cell type when the relative cloning frequency exceeded 
3% of the total clone counts over all tissues. Results were 
pooled for miRNA families. 
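
The expression criterion can be written as a short Python sketch. The function name and the input layout (a dictionary of clone counts per tissue for one miRNA) are assumptions for illustration; the thresholds are those given above.

```python
def expressed_tissues(clone_counts, min_total=10, min_fraction=0.03):
    """Return the tissues in which a miRNA counts as expressed.

    clone_counts: dict mapping tissue name -> clone count for one miRNA.
    A miRNA with fewer than `min_total` clones over all tissues is skipped;
    otherwise a tissue qualifies when its share of the total clone count
    exceeds `min_fraction`.
    """
    total = sum(clone_counts.values())
    if total < min_total:
        return set()
    return {tissue for tissue, count in clone_counts.items()
            if count / total > min_fraction}
```

Pooling over a miRNA family could then be done by taking the union of the tissue sets of its members, although the exact pooling rule is not specified in the text.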

Significance of target gene enrichment 

At the time of our analysis the largest collection of ex- 
perimentally validated target genes of miRNAs was found 
in the TarBase 6.0 database containing 38 384 miRNA 
target-gene pairs for human, of which 3077 were predicted 
by TargetScanHuman 6.2 (21). The proportion of vali- 
dated miRNA-target pairs in all 72 770 TargetScan pre- 
dictions was calculated for over-represented miRNA fami- 
lies, if available, and contrasted with the proportion of valid 
pairs in the filtered subset (Supplementary Table S3) for 
each family separately and for all miRNA families with 
more than 10 validated interactions in TargetScanHuman 
6.2. 

Presence of paired miRNA binding sites 

miRNA binding site positions were extracted from the
TargetScanHuman 6.2 dataset. We compared the number of
miRNA binding sites that can be found in close proximity
to another binding site (dual sites, 8-10 nucleotides),
in the filtered set of genes bound by REST in the ChIP-seq
experiments, to all remaining genes from the dataset as
background. On average the filtered set had a higher number
of miRNA binding sites than the background; therefore
we classified all genes according to their number of binding
sites in the 3'UTR. All classes with 2-26 binding sites were
included in the analysis. Then, a set of n genes composed as
a sum of n_i genes for each class i = 2, ..., 26 was obtained
both for the filtered set and for the background. The number
of dual sites and the number of non-dual sites in each set
of genes were compared by Fisher's exact test. We repeated
the test 1000 times. The resulting p-values were corrected for
multiple testing using the Benjamini and Hochberg method.



Statistics 

The luciferase data were analyzed using a t-test. All further 
p-values except for those generated by simulation were computed
using Fisher's exact test.



Cloning 

We aimed to study the impact of miRNA-448 on the 
UTR-dependent regulation of the PIK3R1 gene. To this 
end, we cloned regions of the human PIK3R1 3'UTR using 
the following PCR primers with restriction site overhangs 
for Region A (chr5: 67593865-67593884: Region-A-Fw: 
AGTActcgagGCCTGGTTTAGCCTGGATGT, Region- 
A-Rv GATgcggccgcCCCACCACCCCACTTGATAC) and 
for Region B (chr5: 67595300-67595319: Region-B-Fw: 
GTCTctcgagTAGGGCAGGAGTGAGAGGTC, Region- 
B-Rv: TGAgcggccgcAAAACGACAAATGCGGTGGG). 
Regions A and B were inserted in the multiple cloning site 
(MCS) of a modified version of the psiCHECK2 vector 
(Promega) containing two separate luciferase genes (renilla 
and firefly) under control of constitutive promoters. The 
MCS was located at the 3' end of the renilla luciferase gene, 
i.e. changes in the renilla luciferase activity are controlled 
by the inserted UTR. The firefly luciferase was used to 
normalize the measurement. Region B contains a putative 
binding site for the miRNA-448 at its 5' end (GATTTA- 
GATATGCAAAAGCTGG). This region was deleted by 
shortening the Region B using a PflMI/XhoI digest fol- 
lowed by Klenow blunt end filling and re-ligation yielding 
a vector containing a modified Region B, namely B-mut. 
All positions refer to human assembly NCBI37/hg19
(February 2009). 



Transfection and stimulation 

HEK293 cells were seeded in 6-well plates in Dulbecco's
modified Eagle's medium (DMEM) (10^6 cells per
well) and transfected with the reporter plasmids (1 µg
DNA per well) using 3 µl Roti®-Fect (Carl Roth, P001.4)
per well at 80% confluence. Twenty-four hours post transfection,
the cells were washed with equilibrated PBS and
trypsinized for 4 min at 37°C. Cells were removed, washed,
centrifuged, re-suspended in DMEM and
seeded in 96-well plates that were previously prepared for
reverse transfection according to the manufacturer's recommendation.
The plates contained 10 pmol miRNA-448 per
well (Invitrogen, hsa-miR-448, Assay-ID: MC10520), 0.3
µl Lipofectamine® RNAiMAX per well (Life Technologies,
Cat. 13778030) and 18 µl Opti-MEM® Medium



5440 Nucleic Acids Research, 2014, Vol. 42, No. 9 



(Life Technologies, Cat. 11058021) per well. Controls con- 
tained the same amount of Lipofectamine® RNAiMAX 
and Opti-MEM® Medium accordingly. 

Dual reporter assay 

The reporter assay was performed 24 h after miRNA transfection
(n = 6). The supernatant was removed from the
cells and 20 µl passive lysis buffer (Promega, Cat. E1941)
was added. After 20 min incubation at 20°C, the dual assay
was performed according to the Hampf and Gossen protocol
(22) using 100 µl of the Firefly buffer (Tricine 20 mM,
MgSO4 2.67 mM, EDTA 100 µM, ATP 530 µM, DTT 33.3
mM, Coenzyme A 270 µM, D-Luciferin 470 µM, pH 7.8)
and 100 µl of the Renilla buffer (NaCl 1.1 M, K2HPO4
220 mM, Na-EDTA 2.2 mM, BSA 6.58 mM, coelenterazine
1.43 µM, pH 5.1). The 96-well plates were measured using
the Luminoskan luminometer (LabSystems). We used automated
injection of the buffers and light measurement using
an in-house developed remote control for the Luminoskan
luminometer. All data are reported as the ratio of the relative
light units of the Renilla/Firefly measurements, rescaled
so that the mean of the Region B control is equal to 1.0
in order to improve readability.
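
The normalization of the luciferase readings can be illustrated with a minimal Python sketch. The function names and the condition labels used in the usage note are hypothetical; the sketch only shows the Renilla/Firefly ratio and the rescaling to a reference condition described above.

```python
from statistics import mean


def normalized_ratios(renilla, firefly):
    """Per-well Renilla/Firefly ratios of relative light units."""
    return [r / f for r, f in zip(renilla, firefly)]


def rescale_to_reference(ratios_by_condition, reference):
    """Divide all ratios by the mean ratio of the reference condition,
    so that the reference condition has a mean of 1.0."""
    ref_mean = mean(ratios_by_condition[reference])
    return {condition: [value / ref_mean for value in values]
            for condition, values in ratios_by_condition.items()}
```

For example, with the ratios of the Region B control stored under a key such as `"B_control"`, calling `rescale_to_reference(ratios, "B_control")` would express every other condition relative to that control.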

RESULTS 

To study the overlap between experimentally verified tar- 
gets of the transcriptional repressor REST and pre- 
dicted miRNA targets we used genome-wide experimental 
datasets of DNA binding sites for REST from ChIP-seq
experiments on 15 human cell lines. From each of these 
datasets we obtained a list of genes likely to be regulated 
by REST according to their close proximity to REST bind- 
ing sites (see the Materials and Methods section for details). 
The size of each gene list as well as a short description of the 
cell lines can be found in Supplementary Table S1.

Several miRNAs have over-represented targets in sets of 
REST-bound genes 

For each of the sets of potential REST targets we computed 
the significance of over-representation of targets of partic- 
ular miRNAs (defined by the TargetScanHuman database, 
version 6.2 (18)) as compared to random sets of genes by 
means of p-values (or FDRs) (Supplementary Table S2;
see the Materials and Methods section). We could do this 
for a total of 153 miRNA families (broadly conserved or 
conserved according to TargetScanHuman definitions) for 
which targets were identified in the 3'-UTRs of the REST- 
bound genes (see the Materials and Methods section for de- 
tails). An arbitrary level of significance (FDR < 0.1) was 
chosen to select results for further analysis. A number of re- 
sults still remained significant even after a stricter cutoff of 
0.05. 

A total of 20 miRNA families were found to be signifi- 
cantly over-represented at that FDR cutoff of 0.1 in one or 
more of the REST-target gene datasets (Table 1). In 14 of 15 
cell lines, enrichment was found (no significant results in cell 
line A549) and 50% of the miRNA families were detected 
more than once. miR-138/138ab, miR-129-5p/129ab-5p, 



the miR-185 family, miR-218/218a, miR-153 and miR- 
448/448-3p were found significantly over-represented in 6, 
8, 8, 10, 12 and 13 of the 15 samples, respectively. 

To assess whether miRNA families could have been 
found to be enriched in many of the samples simply due to 
the number of common genes in the tested REST-bound 
gene lists of each cell type, similarity between all pairs 
of samples according to the Jaccard-index was calculated 
based on: (i) genes present in the gene list (and TargetScan- 
Human) or (ii) miRNAs found enriched for each cell type 
(Figure 3). 

Clustering for Jaccard-indices generated from gene 
counts resulted in separation of the cell lines mostly into 
neural and non-neural (Figure 3a, Supplementary Table 
S6). Cluster I contains non-neural cell lines with the exception
of cell line U87, which is a glioblastoma cell line. Cluster
II includes H1-neurons, SK-N-SH neuroblastoma and
PFSK-1 cerebral brain tumor.

In contrast, the clustering for Jaccard-indices generated 
from miRNA counts reveals a cluster of eight highly sim- 
ilar cell lines (cluster III), which includes both neural and 
non-neural cell lines, with two other clusters (IV and V) that
separate clearly from it (Figure 3b). This result is very different
from the clustering obtained using the gene counts. For
example, the similarity between cell lines HepG2 and K562
is rather low according to the Jaccard-index by genes (0.33)
but is very high by miRNAs (0.80). Thus, our finding that
some miRNAs are found enriched in many cell types cannot
be explained solely by common genes in the test sets. This result
indicates that our analysis uncovers miRNAs that show
a remarkable overlap of targets with REST in different conditions.
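
For reference, the Jaccard-index of two sets is the size of their intersection divided by the size of their union. A minimal Python sketch, with hypothetical function names and an assumed dictionary of sets per cell line, is:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(sets_by_sample):
    """Jaccard index for every unordered pair of samples."""
    return {(s, t): jaccard(sets_by_sample[s], sets_by_sample[t])
            for s, t in combinations(sorted(sets_by_sample), 2)}
```

The same helper applies to both comparisons in the text: the sets can hold REST-bound genes per cell line or enriched miRNA families per cell line.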

We furthermore tested whether each of the 20 miRNAs
has an enrichment of REST-bound targets (according to
ChIP-seq) in their set of targets (see the Materials and
Methods section). A total of 11 miRNA families out of
the 20 have a significant enrichment (Bonferroni-adjusted
p-value < 0.05; see the Materials and Methods section). The
complete set of miRNAs was significantly less associated
with REST (43 of 153 miRNAs; p-value = 0.006; see Supplementary
Table S8).

Filtering associations between over-represented miRNAs and 
REST-bound genes 

Our results are based on 8438 associations of the 20 signifi- 
cantly enriched miRNA families with 3814 predicted target 
genes that are also predicted REST targets (Supplementary 
Table S3, Figure 4). We hypothesize that such results reflect 
the existence of groups of genes that require coordinated 
repression at pre- and post-transcriptional regulatory levels 
and therefore point to high-confidence miRNA-target pre- 
dictions. To demonstrate this point we tested whether these 
miRNA-target associations were significantly enriched in 
experimentally proven interactions. Such a test is challeng- 
ing for several reasons: (i) the largest repository of exper- 
imentally validated miRNA targets, TarBase 6.0 (21), con- 
tains information about just 4.2% of the TargetScanHuman 
6.2 miRNA-gene pairs; (ii) twelve of the 20 over-represented 
miRNA families have no validated target genes and a fur- 
ther three have less than 10. 
Figure 3. Heatmaps of Jaccard-indices comparing the similarity between 15 cell types. (a) Jaccard-indices comparing the similarity in the sets of genes
potentially regulated by REST in each of 15 samples. (b) Jaccard-indices comparing the similarity in the sets of detected over-represented miRNA families
in each sample (A549 not included as there was no over-represented miRNA in this sample). Cell types of neural origin are represented using boldface 
font. 



In Table 2, we present the proportions of experimentally
validated targets of miRNA families in the total
TargetScanHuman dataset and in the subset of REST-bound
genes for the miRNA families with more than 10 validated
interactions. Enrichment, albeit modest, was observed in four
out of five cases, and was marginally significant when considering
the data collectively (p-value = 0.07). miR-218/218a,
the only one of these miRNA families that was over-represented in many
cell types (10 of 15), yielded a significant 1.25-fold enrichment
(p-value = 0.028). It is likely that the selection
of miRNA-target predictions works better with miRNAs 
over-represented in more than one tissue, but this cannot 
be stated with certainty until more experimentally validated 
targets become available. 
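
The fold enrichment reported in Table 2 is the proportion of validated pairs in the filtered subset divided by the proportion of validated pairs among all predictions. As a worked check, the short Python sketch below recomputes it from the counts given in Table 2; the function name is hypothetical and the p-values are not recomputed here.

```python
def fold_enrichment(all_pairs, validated, filtered_pairs, filtered_validated):
    """Return (proportion valid in all pairs, proportion valid in the
    filtered subset, fold enrichment of the filtered subset)."""
    prop_all = validated / all_pairs
    prop_filtered = filtered_validated / filtered_pairs
    return prop_all, prop_filtered, prop_filtered / prop_all


# Counts taken from Table 2: (all pairs, validated, filtered, filtered validated)
table2 = {
    "miR-101/101ab": (804, 65, 635, 50),
    "miR-218/218a": (931, 16, 746, 16),
    "merged data": (3478, 160, 2743, 134),
}

for family, counts in table2.items():
    prop_all, prop_filtered, fold = fold_enrichment(*counts)
    print(f"{family}: {100 * prop_all:.2f}% -> {100 * prop_filtered:.2f}%, fold {fold:.2f}")
```

For miR-218/218a this gives 1.72% and 2.14%, i.e. a 1.25-fold enrichment, in agreement with the table.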

Over-represented miRNAs and REST regulation 

To further investigate the possible functional relation of the 
set of 20 enriched miRNAs with REST, we contrasted them 
with a list of 40 REST-regulated miRNAs (23), of which 
22 were present in our set of TargetScanHuman miRNAs. 
Six of the 20 miRNAs, e.g. miR-129-2, miR-330 and miR-153,
are known or predicted to be REST-regulated miRNAs
(Table 3). This enrichment was significant (p-value
= 0.044; Fisher's exact test). Furthermore, we examined
whether REST was binding nearby the regions coding the 
20 miRNAs (10 kb up- and downstream; see the Materials 
and Methods section for details). REST binding (suggesting 
possible regulation) was found near 16 of the 20 enriched 
miRNA families (Table 3), though this association was not 
significant in comparison to other miRNAs (p-value = 1.0). 
In summary, we found evidence that REST could be repressing
a significant number of the over-represented miRNAs,
suggesting that REST turns off these miRNAs whenever
it is present.

According to Mangan and Alon (24), the three node net- 
work formed by REST, a REST regulated miRNA, and 
their common target is described as an incoherent feed- 



forward loop of type 2 (containing only repressive rela- 
tions). This network motif is very rare in comparison to 
other motifs. Its biological function is not well understood 
and differs depending on system and input signals (see (25) 
for details). 

Over-represented miRNAs and neural function 

Since REST has a role as neural repressor, we wondered 
to what extent the 20 enriched miRNAs targeting REST- 
regulated genes were related to a neural function. The atlas 
on mammalian miRNA expression (20) contains tissue ex- 
pression data for 16 of the 20 over-represented miRNAs. 
According to the atlas, 9 of the 16 tested miRNA families 
are expressed in neural tissue in human (Table 3; enrichment
p-value = 0.05). More specifically, we observed an enrichment
of the over-represented miRNA families in adult hippocampus
(p-value = 0.025, see the Materials and Methods
section for details). 

Some of these miRNAs have well-known functions in 
neural tissue. For example, miR-153 is specific to human 
brain ((26), Table 3). It plays a role in the neuro-pathological 
conditions of Parkinson's and Alzheimer's disease (27,28).
miRNA-138 is expressed in parts of the brain, and in a feed- 
back loop with SIRT1 it controls axon regeneration (29). 
Erroneous expression is implicated in panic disorder (30) 
and various cancer types (31,32). 

Collectively, this evidence suggests a mechanism by
which REST represses a broad set of genes in non-neural tis- 
sues and miRNAs fine-tune their expression in tissues where 
REST is absent (e.g. in neural tissues). 

The role of over-represented miRNAs in glioblastoma 

Expression profiles of miRNAs in glioblastoma have been
studied extensively throughout the past two years. More 
than 28 miRNAs were suggested to have tumor suppress- 
ing properties in glioma or glioblastoma (see Supplemen- 
Figure 4. Graph depicting the network of genes redundantly regulated both by REST and by the 20 over-represented miRNA families. Small circles in
the center represent genes (yellow: neural function, pink: any other function). Lines between genes and REST (right) indicate the number of tissues in
which REST was found close to the particular gene by ChIP-seq (color hue from light green to black). Circles on the left represent the 20 miRNAs (red
border: involvement in glioblastoma) and are colored in hue from light green to black according to the number of tissues where they were found to be
over-represented. Connections between miRNAs and their predicted targets are shown in gray color. Curves between REST and the miRNAs going around
the top of the figure indicate regulation of the miRNAs by REST (known, yellow, or deduced from ChIP-seq data, pink). Possible regulation of REST by
miR-153, miR-217 and miR-448 predicted by TargetScanHuman 6.2 is indicated by green edges. Direct relations between REST, the miRNAs and PIK3R1
are shown with red edges. miRNAs were sorted according to hierarchical clustering with respect to their connections to genes. miRNA family names are
shortened to the first member of each family.



Table 2. Significance of enrichment in miRNA-target interactions in filtered subset

miRNA family        All pairs  Validated pairs (a)  Filtered pairs  Filtered validated pairs  Proportion valid all (%)  Proportion valid filtered (%)  Fold enrichment  p-value  Times observed (b)
miR-101/101ab       804        65                   635             50                        8.08                      7.87                           0.97             0.726
miR-132/212/212-3p  407        25                   332             21                        6.14                      6.33                           1.03             0.498
miR-218/218a        931        16                   746             16                        1.72                      2.14                           1.25             0.028    10
miR-34 family       680        43                   500             36                        6.32                      7.20                           1.14             0.078
miR-374ab           656        11                   530             11                        1.68                      2.08                           1.24             0.094
merged data         3478       160                  2743            134                       4.60                      4.89                           1.06             0.071

(a) Pairs of associations between miRNAs and genes experimentally validated according to TarBase 6.0.
(b) Number of tissues where the miRNA was found to be enriched (Table 1).

Table 3. Regulation of filtered miRNA families (Table 1) by REST

                      REST targets
miRNA family          Johnson and          Samples with    Expressed in     Glioma/glioblastoma
                      Buckley (23)         ChIP signal^a   neural tissue^b  suppressor (gs)^c
miR-101/101ab                              5
miR-129-5p/129ab-5p   Known                15              d                gs
miR-132/212/212-3p    Known                15              e,d
miR-138/138ab                              6               d                gs
miR-139-5p            Known                14              e,s,d
miR-153               Likely^e             8               s,d              gs
miR-185 family                             1
miR-190/190ab                              3
miR-208 family                             0               n.a.
miR-217                                    1
miR-218/218a          Likely^e             1               d                gs
miR-300/381/539-3p                         1               n.a.
miR-326/330/330-5p    miR-330 known        13                               gs
miR-329 family                             1
miR-34 family                              2               d                gs
miR-374ab                                  7
miR-421                                    0               d
miR-448/448-3p                             0               n.a.
miR-499-5p                                 0               d
miR-543                                    2               n.a.

a Supplementary Table S4 contains a detailed list of the miRNAs that were bound by REST in a certain cell type according to the ChIP-seq data.
b miRNAs are defined as detected (d) if non-cancerous neural tissue copy count was more than 3% of total counts for all tissues in (20), and specific (s) or enriched (e) in brain according to (26).
c Details are available in Supplementary Table S5.
d According to (23) and (45).
e miR-153 and miR-218 are in the introns of a REST regulated gene. Liang et al. found that 77% of tested intronic miRNAs are co-expressed with their host genes (46).



tary Table S5). Although these studies cannot be compared
easily due to usage of different non-neoplastic references
and experimental setups (33), we found that four (miR-129-5p/129ab-5p,
miR-138, miR-153 and miR-218/218a) out of
six miRNAs over-represented in many tissues (>5, Table 1)
seem to function as tumor suppressors in glioblastoma
(Table 3; Supplementary Table S5), corresponding to an
enrichment with a P-value of 0.011.

These results led us to wonder whether the remaining
miRNAs (miR-185 family and miR-448/448-3p) could also
be involved in glioblastoma. Notably, miR-448, our most
over-represented miRNA (in 13 of 15 cell lines), is not
known to be expressed in neural tissue. Instead, it has been
reported to be involved in differentiation of adipocytes by
targeting KLF5 (34) and to be part of an inhibitory feedback
loop with NF-κB in breast cancer cells (35). Its function
remains to be clarified.



Verification of the effect of miR-448 on the 3'UTR of 
PIK3R1 

Looking for a way to assess whether the miR-185 family 
or miR-448/448-3p could be involved in glioblastoma and 
how they could have an associated phenotypic impact, we 
searched our list of filtered predictions (Supplementary Ta- 
ble S3) for genes related to glioblastoma with binding sites 
for these two miRNAs. 

PIK3R1 is an oncogene relevant to proliferation and
invasiveness of glioblastoma multiforme cells (36) and was
predicted to be targeted by miR-448 as well as by the other two
most enriched miRNAs, miR-153 and miR-218/218a (Table 1),
which are already known to be glioblastoma suppressors.
Therefore, we decided to test the effect of miR-448 on
the 3'UTR of this oncogene.

Two approximately 850-bp-long regions of the 3'UTR of
PIK3R1 (Figure 5a) were cloned into a dual reporter
plasmid, and the action of miR-448 was tested on both
regions. Region B showed a clear effect on UTR-dependent
regulation of the reporter when exposed to miR-448
(Figure 5b). When the same experiment was performed with a
version of the construct in which the putative miR-448
binding site was deleted (del-B), the effect was completely
abolished. Region A did not respond to miR-448 in our assay.
In summary, PIK3R1 expression can be down-regulated by
miR-448 in vitro by targeting Region B.

Therefore, the enrichment analysis identified parts of a 
regulatory network with both REST and miR-448 as possi- 
ble alternative regulators of PIK3R1. Since PIK3R1 is pre- 
dicted to be the target of another two miRNAs that are tu- 
mor suppressors, we suggest that miR-448 could also be a 
tumor suppressor. Whether miR-448 has a true regulatory 
function in vivo and particularly in glioblastoma remains to 
be clarified. 

In summary, studying the overlap between miRNA tar- 
gets and genes potentially regulated by REST, we obtained 
a network of genes (many of them with neural functions) 
targeted both by REST and by 20 miRNAs (Figure 4). This 
network is enriched in known miRNA targets and in miR- 
NAs with neural related functions. We hypothesize that the 
study of this network provides information on miRNA tar- 
gets and function. 

Figure 5. miRNA-UTR assay. The action of miR-448 was tested
on two approximately 850-base-long regions of the 3'UTR of the PIK3R1
gene (Region A and Region B). (a) Genomic location of the experimentally
investigated regions A and B and of the mutated form of region B in relation
to the 3'UTR of the PIK3R1 gene (thin) and the coding region (CDS) of
the two downstream exons (thick) on chromosome 5. The putative binding
sites of miR-448 are indicated by the arrows and small dark
boxes. The intronic region is given by the thin line. (b) Relative luciferase
activity (Renilla/Firefly) in a PIK3R1 3'UTR-dependent reporter assay, with
the mean control of Region B set to 1.0. The results indicate a clear action of
miR-448 on Region B when compared to controls. The effect of
miR-448 is completely abolished after deletion of the putative binding
site in Region B (Region B-del). Region A does not show miR-448-dependent
regulation in this assay. Data are presented as mean; error
bars indicate standard deviation (SD).
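
The normalization described in panel (b) can be illustrated with a short sketch: each replicate is expressed as a Renilla/Firefly ratio, and all ratios are divided by the mean ratio of the Region B control so that this control has a mean of 1.0. The readings below are invented for illustration and are not the measured data; the function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def ratios(renilla, firefly):
    """Per-replicate Renilla/Firefly ratio."""
    return [r / f for r, f in zip(renilla, firefly)]


def normalize(values, reference):
    """Scale values so that the mean of the reference condition becomes 1.0."""
    reference_mean = mean(reference)
    return [v / reference_mean for v in values]


# Invented readings (arbitrary units), three replicates per condition.
control_b = ratios([100.0, 110.0, 90.0], [50.0, 50.0, 50.0])
mir448_b = ratios([50.0, 60.0, 40.0], [50.0, 50.0, 50.0])

control_rel = normalize(control_b, control_b)
mir448_rel = normalize(mir448_b, control_b)

# Mean and sample standard deviation, as would be plotted with error bars.
print("control:", mean(control_rel), stdev(control_rel))
print("miR-448:", mean(mir448_rel), stdev(mir448_rel))
```

Dividing by the control mean rather than by each paired control keeps the replicate-to-replicate spread of the control visible in its own error bar.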



DISCUSSION 

We have presented here a protocol to study predictions of 
miRNA targets by integration with data on transcription 
repressor binding. The observation that the transcripts of 
genes targeted by a regulatory protein are enriched in tar- 
gets for particular miRNAs can be assessed for significance. 
One possible reason for such enrichment may stem from 
the existence of gene groups that are co-repressed at the 
pre-transcriptional level in one cell type and at the post- 
transcriptional level in another cell type. While these effects 
might not occur simultaneously, the underlying global reg- 
ulatory network may reflect the overlap between levels. 
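
As an illustration of how such an overlap could be assessed for significance, the sketch below uses a one-sided hypergeometric test. This choice is an assumption made for illustration, since this section does not restate the test that was used; the function name and the toy numbers are hypothetical.

```python
from math import comb


def overlap_pvalue(universe, n_repressor, n_mirna, overlap):
    """P(X >= overlap) under a hypergeometric null.

    universe    -- number of genes considered in total
    n_repressor -- genes bound by the transcriptional repressor
    n_mirna     -- genes predicted as targets of the miRNA
    overlap     -- genes found in both sets
    """
    total = comb(universe, n_mirna)
    upper = min(n_repressor, n_mirna)
    favourable = sum(
        comb(n_repressor, k) * comb(universe - n_repressor, n_mirna - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / total


# Toy numbers only: 10 genes, 4 repressor targets, 3 miRNA targets, 2 shared.
print(overlap_pvalue(10, 4, 3, 2))
```

A small value indicates that the miRNA targets overlap with the repressor targets more often than expected if the two gene sets were unrelated.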

We applied our approach to 15 datasets indicating binding
sites for the transcriptional repressor REST. Previous
observations suggested the existence of multiple miRNAs
cooperating with TFs in gene regulation (7). Here
we are rather studying the overlap between the sets of targets
of a transcriptional repressor, REST, and of particular
miRNAs. We assessed the existence of such significant
overlaps separately for each sample and miRNA. Some
miRNA families were detected in many of the analyzed cell
types, indicating reproducibility in different contexts, and
we showed that this result did not depend solely on regulated
genes that are common between the cell types. Due
to the strong repressive effect of REST, we do not expect
REST and these miRNAs to act on common targets in the
same condition. It is more likely that REST will be active in
non-neural tissue and the miRNAs will be active in tissues
where REST is absent to fine-tune the expression of REST
targets.
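
Testing every miRNA in every sample calls for a multiple-testing correction, and an FDR threshold of 0.2 is used elsewhere in this Discussion. The sketch below assumes a Benjamini-Hochberg adjustment applied within each sample, which is an assumption because the correction procedure is not named in this section. The P-values, miRNA labels and function names are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def samples_detected(pvalues_by_sample, threshold=0.2):
    """Count, per miRNA, the samples in which its adjusted P-value is below threshold."""
    counts = {}
    for by_mirna in pvalues_by_sample.values():
        names = list(by_mirna)
        adjusted = benjamini_hochberg([by_mirna[name] for name in names])
        for name, q in zip(names, adjusted):
            counts[name] = counts.get(name, 0) + int(q < threshold)
    return counts


# Made-up P-values for three hypothetical miRNAs in two hypothetical samples.
TOY = {
    "sample_1": {"miR-X": 0.001, "miR-Y": 0.30, "miR-Z": 0.04},
    "sample_2": {"miR-X": 0.02, "miR-Y": 0.50, "miR-Z": 0.60},
}
print(samples_detected(TOY))
```

The resulting per-miRNA counts correspond in spirit to the number of samples in which a family is reported as over-represented.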

We collected a set of 20 miRNAs whose targets overlapped
significantly with REST targets. The set of predicted
associations between these 20 miRNAs and REST
targets was significantly enriched in experimentally proven
miRNA-target associations and therefore we propose that
it can be used to expand our knowledge about miRNA targets.
Additionally, we found that this set of selected targets
is globally enriched in closely spaced sites (dual sites; 80%
of randomization tests with P-value < 0.05, see the Materials
and Methods section). This hints at a mechanism of
miRNA action by which multiple miRNAs can act together
with a synergistic effect (37).

Many of the detected miRNAs have functions related to 
neural identity, which could be expected from their over- 
lap in targets with REST. Four of the miRNAs found 
more often in our study have known tumor suppressing ef- 
fects in glioblastoma, a brain tumor originating from glial 
cells: miR-129-5p/129ab-5p, miR-138, miR-153 and miR- 
218/218a. Interestingly, two independent studies reported 
that elevated expression of REST promotes maintenance of 
self-renewal and oncogenic properties of glioblastoma cells 
(38,39). 

We then used our results to study the function of miR- 
448, which is the miRNA that we found in most samples and 
is currently poorly characterized. We could experimentally 
verify its potential effect on the 3'UTR of the glioblastoma 
oncogene PIK3R1. Since TargetScanHuman 6.2 predicts its 
effect on REST, it becomes an interesting candidate for the 
study of the networks associated with REST and glioblas- 
toma. Collectively, our results suggest that our method can 
be used both to uncover functional fractions of the miRNA 
regulatory network without the need of extensive miRNA 
profiling and to assign putative function to miRNAs. 

We note that our approach provides hints about the 
global underlying regulatory network in a static manner, i.e. 
we provide evidence for high-confidence miRNA-target as- 
sociations, but we cannot indicate whether the predicted re- 
pression of these targets by the corresponding miRNAs will 
happen in one or another tissue. 

Our approach relies on the quality of both the gene binding
sites and the miRNA binding sites. For example, the
experimental setup used here for the detection of genes
regulated by REST does not distinguish different isoforms of
REST that have tissue-specific patterns of expression, likely
having different regulatory properties (40). The predictions
of TargetScanHuman 6.2 correspond to associations of just
about 11 000 genes and 153 broadly conserved miRNAs.
These are relatively small numbers, for example in comparison
to the number of known transcripts and of known
human genes. We expect that the number of known miRNAs
and targets will grow with the development of high-throughput
techniques to experimentally measure RNA-protein
interactions such as HITS- and PAR-CLIP (41,42).
This will increasingly allow using experimentally validated
miRNA binding sites, which will improve the outcome of
our method.

Nucleic Acids Research, 2014, Vol. 42, No. 9 5445

TargetScanHuman 6.2 uses 7-nt-long seed sequences to
scan for miRNA targets and then predicts a target site if
the seed has a perfect match, or a match of the last six
nucleotides followed by an (anchoring) adenine. Therefore,
miRNAs with seeds sharing the last six nucleotides can have
overlapping pattern matches (Supplementary Figure S1)
and, as a result, will have much more similar target gene lists
than expected by chance. This effect cannot be removed
during the over-representation analysis but should be kept
in mind when interpreting the output. For example, the results
in Table 1 are not independent for two of the pairs of miRNAs
shown in Figure 4. These pairs and miR-101/101ab
were tested for over-representation again, this time using
only non-overlapping genes. miR-101/101ab and
miR-448/448-3p remained enriched with FDR < 0.2. miR-153
was not over-represented in this analysis in any of the
tissues. For the miR-208 family and miR-499-5p, no
significant result could be obtained either (data not shown). This
means that we cannot tell whether one, both or neither of them
is truly over-represented.
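
The source of this dependence can be made concrete with a small sketch of the two matching rules described above. It assumes that each 7-nt seed pattern is written in the orientation in which it is searched in the 3'UTR, so that the second rule is literally "last six nucleotides plus an adenine". The two patterns and the toy UTR sequences are invented for illustration, and the function names are hypothetical; this is not the TargetScan implementation.

```python
def site_patterns(seed_pattern):
    """Return the two 7-nt patterns that count as a site for one seed pattern."""
    if len(seed_pattern) != 7:
        raise ValueError("expected a 7-nt seed pattern")
    perfect = seed_pattern                 # perfect 7-nt match
    anchored = seed_pattern[1:] + "A"      # last six nt followed by an adenine
    return {perfect, anchored}


def genes_with_site(utrs, seed_pattern):
    """Return the genes whose UTR contains at least one site for the pattern."""
    patterns = site_patterns(seed_pattern)
    return {gene for gene, utr in utrs.items() if any(p in utr for p in patterns)}


# Two invented patterns that share their last six nucleotides.
PATTERN_X = "CGUCUUA"
PATTERN_Y = "AGUCUUA"

# Invented toy UTRs.
UTRS = {
    "gene1": "CCCGUCUUACC",   # perfect match for X only
    "gene2": "GGAGUCUUAGG",   # perfect match for Y only
    "gene3": "UUGUCUUAAUU",   # anchored match, shared by X and Y
    "gene4": "ACACACACAC",    # no site
}

targets_x = genes_with_site(UTRS, PATTERN_X)
targets_y = genes_with_site(UTRS, PATTERN_Y)
print("shared:", targets_x & targets_y)
print("X only:", targets_x - targets_y)
```

Restricting a re-test to the "X only" and "Y only" sets corresponds to the use of non-overlapping genes described above.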

Here, we have applied our procedure to filter predictions
of miRNA targets using information on targets of REST.
It can in principle be performed on any dataset mapping
the genomic binding sites of a certain transcriptional regulator,
provided that the number of genes guarantees reliable
statistical results and that a certain number of common targets
exists between the regulatory protein and miRNAs. We
tested the procedure on ChIP-seq datasets of binding sites
for regulators other than REST in HepG2 cells, including
transcriptional activators, and miRNA target enrichment
was often found (Supplementary Table S7).

Of note, we detected over-representation of miRNA families
for Pol2-bound genes. Pol2 binds DNA but it is expected
to act neither as a transcriptional repressor nor as
an activator. This highlights the fact that miRNA enrichment
in 3'UTRs can be due to reasons not directly associated
with similarity of regulatory effects between miRNAs
and transcription factors. However, we believe that our
approach will be useful to point to biologically significant
miRNA targets if applied to gene datasets expected to be
co-regulated.

Our approach provides a novel methodology that can
take advantage of one or more sets of targets of any given
gene to filter miRNA target predictions on wide gene sets.
Previous work integrated miRNA and transcriptional regulation
data but with different objectives. For example,
Tsang et al. searched for co-regulated miRNAs and genes
using microarray measurements of transcript expression
(43). Hackl et al. looked for over-representation of miRNA
binding sites in co-expressed genes and found promising results
(44). Shalgi et al. did enrichment analysis of miRNA
binding sites on genes with computationally predicted TF
binding sites (7). No study has been performed on enrichment
of miRNA binding sites in experimentally determined
targets (by ChIP-seq or other techniques) of a given regulatory
protein.

In summary, we found a significant overlap between tar- 
gets of REST and the targets of a set of miRNA families. 
The sets of predicted target genes that these miRNAs have 
in common with REST tend to contain a proportion of 
experimentally known miRNA targets higher than in pre- 
dictions without detected REST binding. Furthermore, we 
have shown that our approach can be used to propose novel 
functions for miRNAs from the context of the network 
spanned by REST and the over-represented miRNAs. The 
method can be applied to other transcriptional repressors 
and is expected to improve as miRNA-target predictions 
improve and with the publication of further datasets profil- 
ing genome-wide binding sites of transcriptional regulators 
in multiple cell types. 